快速移动受试者的运动模糊是摄影中的一个长期问题,由于收集效率有限,尤其是在弱光条件下,在手机上非常常见。尽管近年来我们目睹了图像脱毛的巨大进展,但大多数方法都需要显着的计算能力,并且在处理高分辨率照片的情况下具有严重的局部动作。为此,我们根据手机的双摄像头融合技术开发了一种新颖的面部脱毛系统。该系统检测到主题运动以动态启用参考摄像头,例如,最近在高级手机上通常可用的Ultrawide Angle摄像机,并捕获带有更快快门设置的辅助照片。虽然主镜头是低噪音但模糊的,但参考镜头却很锋利,但嘈杂。我们学习ML模型,以对齐和融合这两张镜头,并在没有运动模糊的情况下输出清晰的照片。我们的算法在Google Pixel 6上有效运行,每次拍摄需要463毫秒的开销。我们的实验证明了系统对替代单片,多帧,面部特异性和视频脱张算法以及商业产品的优势和鲁棒性。据我们所知,我们的工作是第一个用于面部运动脱毛的移动解决方案,在各种运动和照明条件下,在数千个图像中可靠地工作。
translated by 谷歌翻译
尽管神经辐射场(NERF)在新型视图合成方面表现出了令人印象深刻的进步,但大多数方法通常需要具有准确的相机姿势的同一场景的多个输入图像。在这项工作中,我们试图将输入实质上减少到单个未予以的图像。现有的方法在本地图像功能上有条件重建一个3D对象,但通常会在远离源视图的视点处进行模糊的预测。为了解决这个问题,我们建议利用全球和本地功能形成表现力的3D表示。全局功能是从视觉变压器中学到的,而本地功能则从2D卷积网络中提取。为了综合一种新型视图,我们训练以学习的3D表示条件进行量渲染的多层感知器(MLP)网络。这种新颖的3D表示允许网络重建看不见的区域,而无需执行对称或规范坐标系等约束。我们的方法只能从单个输入图像中渲染新视图,并使用单个模型在多个对象类别中概括。定量和定性评估表明,所提出的方法可实现最先进的绩效,并使细节比现有方法更丰富。
translated by 谷歌翻译
视频博客和自拍照是流行的社交媒体格式,通常由广角相机捕获,以显示人类受试者和扩展的背景。遗憾的是,由于透视投影,靠近角落和边缘的面孔表现出明显的扭曲,延伸并挤出面部特征,导致视频质量差。在这项工作中,我们展示了一种视频扭曲算法来纠正这些扭曲。我们的主要思想是在面部地区本地应用立体投影。我们使用空间 - 时间能量最小化配制网眼翘曲问题,并使用线路保存术语最小化背景变形,以维持背景中的直边。为了解决时间一致性,我们通过潜在变量限制了翘曲网格和面部轨迹上的时间平滑度。对于性能评估,我们开发了具有广泛焦距的广角视频数据集。用户学习表明,83.9%的用户更喜欢基于透视投影的其他替代方案的算法。
translated by 谷歌翻译
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image superresolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitates resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
translated by 谷歌翻译
With the fast development of big data, it has been easier than before to learn the optimal decision rule by updating the decision rule recursively and making online decisions. We study the online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for online and adaptive data collection environment that can update decision rules via weighted stochastic gradient descent. We allow different weighting schemes of the stochastic gradient and establish the asymptotic normality of the parameter estimator. Our proposed estimator significantly improves the asymptotic efficiency over the previous averaged SGD approach via inverse probability weights. We also conduct an optimality analysis on the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that the remainder term in the Bahadur representation entails a slower convergence rate compared to classical SGD due to the adaptive data collection.
translated by 谷歌翻译
Model counting is a fundamental problem which has been influential in many applications, from artificial intelligence to formal verification. Due to the intrinsic hardness of model counting, approximate techniques have been developed to solve real-world instances of model counting. This paper designs a new anytime approach called PartialKC for approximate model counting. The idea is a form of partial knowledge compilation to provide an unbiased estimate of the model count which can converge to the exact count. Our empirical analysis demonstrates that PartialKC achieves significant scalability and accuracy over prior state-of-the-art approximate counters, including satss and STS. Interestingly, the empirical results show that PartialKC reaches convergence for many instances and therefore provides exact model counting performance comparable to state-of-the-art exact counters.
translated by 谷歌翻译
Robots are traditionally bounded by a fixed embodiment during their operational lifetime, which limits their ability to adapt to their surroundings. Co-optimizing control and morphology of a robot, however, is often inefficient due to the complex interplay between the controller and morphology. In this paper, we propose a learning-based control method that can inherently take morphology into consideration such that once the control policy is trained in the simulator, it can be easily deployed to robots with different embodiments in the real world. In particular, we present the Embodiment-aware Transformer (EAT), an architecture that casts this control problem as conditional sequence modeling. EAT outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired robot embodiment, past states, and actions, our EAT model can generate future actions that best fit the current robot embodiment. Experimental results show that EAT can outperform all other alternatives in embodiment-varying tasks, and succeed in an example of real-world evolution tasks: stepping down a stair through updating the morphology alone. We hope that EAT will inspire a new push toward real-world evolution across many domains, where algorithms like EAT can blaze a trail by bridging the field of evolutionary robotics and big data sequence modeling.
translated by 谷歌翻译
Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpus. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance level annotations of persuasion strategy, and game level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.
translated by 谷歌翻译
Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which can not be accomplished by strong baselines.
translated by 谷歌翻译
Graphene quantum dots provide a platform for manipulating electron behaviors in two-dimensional (2D) Dirac materials. Most previous works were of the "forward" type in that the objective was to solve various confinement, transport and scattering problems with given structures that can be generated by, e.g., applying an external electrical field. There are applications such as cloaking or superscattering where the challenging problem of inverse design needs to be solved: finding a quantum-dot structure according to certain desired functional characteristics. A brute-force search of the system configuration based directly on the solutions of the Dirac equation is computational infeasible. We articulate a machine-learning approach to addressing the inverse-design problem where artificial neural networks subject to physical constraints are exploited to replace the rigorous Dirac equation solver. In particular, we focus on the problem of designing a quantum dot structure to generate both cloaking and superscattering in terms of the scattering efficiency as a function of the energy. We construct a physical loss function that enables accurate prediction of the scattering characteristics. We demonstrate that, in the regime of Klein tunneling, the scattering efficiency can be designed to vary over two orders of magnitudes, allowing any scattering curve to be generated from a proper combination of the gate potentials. Our physics-based machine-learning approach can be a powerful design tool for 2D Dirac material-based electronics.
translated by 谷歌翻译